{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "46e5110c-ed35-463e-a9f6-cff9cda6221b",
   "metadata": {},
   "source": [
    "This notebook shows how to insert documents incrementally into two LlamaIndex index types: a list index and a tree index.\n",
    "\n",
    "To see how to build an index from documents at initialization instead, see `TestEssay.ipynb`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ef7a7a6-dc10-4d94-8bdd-22b4954d365a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set your OpenAI API key (replace the placeholder below)\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"INSERT OPENAI KEY\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "708da88f-d1b0-41d2-ad71-773300bf3ec5",
   "metadata": {},
   "source": [
    "## List Index Insert"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5ff6b625-fb53-4885-8fbb-0fa8dcf6ef57",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### Data Prep\n",
    "Split the source document into smaller sub-documents that we can insert one at a time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a191e411-f07f-4895-9943-6a8b030abd66",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.langchain_helpers.text_splitter import TokenTextSplitter\n",
    "from llama_index import SimpleDirectoryReader, Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "0a23c1a8-71ea-4b6d-ae42-5c1cf4014dff",
   "metadata": {},
   "outputs": [],
   "source": [
    "document = SimpleDirectoryReader(\"data\").load_data()[0]\n",
    "text_splitter = TokenTextSplitter(separator=\" \", chunk_size=2048, chunk_overlap=20)\n",
    "text_chunks = text_splitter.split_text(document.text)\n",
    "doc_chunks = [Document(text=t) for t in text_chunks]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bbdd1b84-d8e4-421a-972c-389a5b0160c6",
   "metadata": {},
   "source": [
    "#### Insert into Index and Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "d6dcb65d-1a10-471a-8b80-d1bedf7437dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import ListIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "52c3839a-aa57-412b-b5cf-78c71d6dae3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# initialize blank list index\n",
    "index = ListIndex([])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3abf2203-54b6-44c3-ac98-97503b18d3ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "# insert new document chunks\n",
    "for doc_chunk in doc_chunks:\n",
    "    index.insert(doc_chunk)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "228ccfb8-00c5-49d1-ba6c-61a757dd76eb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Starting query: What did the author do growing up?\n"
     ]
    }
   ],
   "source": [
    "# query\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "2580025c-7a5a-419e-84f0-b2ad7b62b6c2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>\n",
       "\n",
       "The author worked on writing and programming and also took classes at Harvard and RISD. The author also worked at a company called Interleaf and wrote a book on Lisp. The author decided to write another book on Lisp and also started a company called Viaweb with the goal of putting art galleries online. The company eventually pivoted to focus on creating software to build online stores. The author also worked on making ecommerce software in the second half of the 90s.\n",
       "\n",
       "The author also worked on building the infrastructure of the web and wrote essays that were published online. The author also worked on spam filters and bought a building in Cambridge to use as an office. The author also had a dinner party every Thursday night.\n",
       "\n",
       "The author also worked on marketing at a Boston VC firm, writing essays, and building the infrastructure of the web. The author also started the Y Combinator program to help fund startups.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "dd41721b-0388-4300-b1cd-75c1013795f4",
   "metadata": {},
   "source": [
    "## Tree Index Insert"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7d5ecfd2-5913-4484-9619-8825b252406b",
   "metadata": {},
   "source": [
    "#### Data Prep\n",
    "Split the source document into smaller sub-documents that we can insert one at a time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4d55ffef-a309-40ca-bbcf-231e418b6d4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.langchain_helpers.text_splitter import TokenTextSplitter\n",
    "from llama_index import SimpleDirectoryReader, Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2d850fe2-78e3-46de-a78d-4ef12848e41d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: we keep only the first 30 chunks (last line of this cell) to save on cost\n",
    "document = SimpleDirectoryReader(\"data\").load_data()[0]\n",
    "text_splitter = TokenTextSplitter(separator=\" \", chunk_size=256, chunk_overlap=20)\n",
    "text_chunks = text_splitter.split_text(document.get_text())\n",
    "doc_chunks = [Document(text=t) for t in text_chunks]\n",
    "\n",
    "doc_chunks = doc_chunks[:30]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d4f13125-5627-48f8-9035-e734849dc9da",
   "metadata": {},
   "source": [
    "#### Insert into Index and Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4d631ba9-e5fe-46d1-ae10-936a4704c95e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import TreeIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "54454e62-4df1-482d-978b-807d4a802033",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Building index from nodes: 0 chunks\n"
     ]
    }
   ],
   "source": [
    "# initialize blank tree index\n",
    "tree_index = TreeIndex([])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "422dde38-5228-499c-a9d3-49b40fd14d34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# insert new document chunks, printing 1-based progress\n",
    "for i, doc_chunk in enumerate(doc_chunks, start=1):\n",
    "    print(f\"Inserting {i}/{len(doc_chunks)}\")\n",
    "    tree_index.insert(doc_chunk)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f528cded-7a43-4854-82ce-361a6610bc34",
   "metadata": {},
   "outputs": [],
   "source": [
    "# query\n",
    "query_engine = tree_index.as_query_engine()\n",
    "response_tree = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c753016d-b21e-47f3-b9f3-a70943f407c5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>The author wrote stories and tried to program computers.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Markdown(f\"<b>{response_tree}</b>\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index",
   "language": "python",
   "name": "llama_index"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
